Apparatus for controlling access in a data processor

ABSTRACT

A data processor comprises an array of processor elements and a memory accessible by each processor element. An array of switching elements is provided, each associated with a processor element, the switching elements being interconnected to enable data to be transferred between different segments of memory associated with different processor elements or between different processor elements in the array.

FIELD OF THE INVENTION

[0001] The present invention relates to access control in a data processor, and in particular but not limited to access control in a single instruction multiple data (SIMD) processor.

BACKGROUND OF THE INVENTION

[0002] A typical single-instruction-multiple-data (SIMD) processor has multiple processor units, each having its own associated memory space. The processor units are simple processors unable to fetch or interpret instructions, and are controlled by a single control unit, whereby the processor units act as slaves, performing arithmetic-logic operations at its request. One advantage of this architecture is that more memory and processor units can be easily added to the computer.

[0003] An example of a SIMD processor is described in U.S. Pat. No. 5,956,274 (the '274 patent) issued on 21st Sep. 1999 to Duncan G. Elliott, et al. In this architecture the processing units are placed within the memory, there being one processor unit per column of storage elements, each processor unit being directly coupled to the sense amplifier of each column, and whose output is coupled to the memory column decoder. Each processor element is a single bit processor element and is capable of processing serial data output from the memory column to which it is coupled and associated. The disclosed structure allows for higher bandwidth communications between the memory and processing elements, allowing for a much higher processing throughput, as processing is not limited by the ability to provide data to the individual processing elements.

[0004] There are, however, aspects to the disclosed architecture that hamper its ability to be widely implemented. First, the structure disclosed in the '274 patent implements a single row, i.e. a 1-D layout, of processing elements. Second, processing elements are coupled to and associated with a single column of memory such that the processing elements in the '274 patent are only able to communicate with the column or columns of memory with which they are coupled and associated.

[0005] In applications such as the processing of images, including video, it is desirable to have a high bandwidth of data from the memory. It is further desirable to have access to numerous portions of memory, including those with which a given processing element is not associated. It is also advantageous to implement an array of processing elements, i.e. a 2-D structure.

[0006] The tight integration of processing elements and memory as outlined in the '274 patent generally makes it difficult to provide for communications between a two dimensional array of processing elements. It is further difficult to provide for communications between a given processing element and the portions of memory with which it is not associated. A communication network that implements 1 to 1 communications links between a given processing element and all other processing elements and all portions of memory is not practical, even with multi-layer metallization technology as is found in current semiconductor processing. Therefore there is a need for communications between processing elements and memory that do not require 1 to 1 links between elements and that are implementable within a structure where processing elements are integrated in memory.

SUMMARY OF THE INVENTION

[0007] According to one aspect of the present invention, there is provided a data processor apparatus, comprising: a first processor element and a second processor element, a memory having a first part and a second part, the first processor element being coupleable to the first part for at least one of read and write access, the second processor element being coupleable to the second part for at least one of read and write access, and an access switch for selectively coupling the first processing element to one of the first part, for at least one of read and write access, and the second part, for at least one of read and write access.

[0008] Advantageously, this arrangement enables a processor element to access a memory segment associated with another processor element without the need to involve the processor element associated with the other memory segment in the data transfer, which therefore allows the associated processor element to perform other functions, rather than spend time/cycles transferring data to another processor element from its memory segment. This arrangement not only provides the flexibility of enabling data transfers between different memory segments and a given processor element, but also significantly reduces the time required for the transfer, and in embodiments of the present invention, data transfers from non-local memory segments may be achieved in a single cycle.

[0009] According to another aspect of the present invention there is provided a data processor apparatus, comprising: a first processor element and a second processor element, a memory having a first part and a second part, said first processor element being coupleable to said first part for at least one of read and write access, said second processor element being coupleable to said second part for at least one of read and write access, and an access switch for selectively coupling said first processing element to one of said first part, for at least one of read and write access, and said second part, for at least one of read and write access.

[0010] According to another aspect of the present invention there is provided a data processor apparatus comprising a plurality of processor elements, a memory having a plurality of parts, each different part being coupleable to a different one of said plurality of processor elements, and switch means for switching at least one of said processor elements from its associated memory part to the memory part associated with at least one other processing element.

[0011] According to another aspect of the present invention there is provided a switching element for switchably coupling an array of circuit elements each having an input and an output, the switching element comprising: an input for coupling to the output of a circuit element, an output for coupling to an input of a circuit element, and first and second switch means, said first switch means having a first state in which said first port is coupled to said input port, and a second state in which said first port is coupled to said output port, and said second switch means having a first state in which said second port is coupled to said input port and a second state in which said second port is coupled to said output port.

[0012] According to another aspect of the present invention there is provided a data processor apparatus comprising a plurality of processor elements and a memory having a plurality of segments, each containing at least one column of storage elements, and each segment having a memory port, and wherein at least one memory port is coupled to at least two processor elements.

[0013] According to another aspect of the present invention there is provided a data processor having a first memory block, a first array of processor elements, each processor element being capable of accessing said first memory block, a first array of switching elements each associated with a respective processor element of said first array, a second memory block and a second array of processor elements, each being capable of accessing said second memory block, a second array of switching elements each associated with a respective processor element of said second array, wherein a corresponding switching element of said first array is coupled to a corresponding switching element of said second array.

[0014] According to another aspect of the present invention, there is provided a data processor comprising an array of circuit elements and a switching element associated with each of said circuit elements, said switching elements being interconnected to enable data to be transferred between said circuit elements.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] Examples of embodiments of the present invention will now be described with reference to the drawings, in which:

[0016] FIG. 1 shows a block diagram of a data processor according to an embodiment of the present invention;

[0017] FIG. 2 shows a diagram of an access switching arrangement according to an embodiment of the present invention;

[0018] FIG. 3 shows a schematic diagram of a data processor apparatus according to an embodiment of the present invention;

[0019] FIG. 4 shows a schematic diagram of a computational unit according to an embodiment of the present invention;

[0020] FIG. 5 shows a schematic diagram of a switching element interconnect scheme according to an embodiment of the present invention;

[0021] FIG. 6 shows a diagram of an array of computational units according to an embodiment of the present invention;

[0022] FIG. 7 shows a diagram of a two-dimensional array of switching elements and an interconnect scheme according to an embodiment of the present invention;

[0023] FIG. 8 shows an embodiment of a switching element, which may be used in the interconnect scheme of FIG. 7;

[0024] FIG. 9 shows a diagram of a switching element, according to another embodiment of the present invention;

[0025] FIG. 10 shows an array of interconnected switching elements of the embodiment shown in FIG. 8;

[0026] FIG. 11 shows a diagram of a switching element according to another embodiment of the present invention;

[0027] FIG. 12 shows a diagram of an array of switching elements of the embodiment of FIG. 10;

[0028] FIG. 13 shows a diagram of a switching element according to another embodiment of the present invention;

[0029] FIG. 14 shows Table 1, which contains the possible states of a tri-stateable inverter;

[0030] FIG. 15 shows Table 2, which contains examples of various sets of control signals for controlling the embodiment of the switching element shown in FIG. 13;

[0031] FIG. 16 shows a diagram of a switching element according to another embodiment of the present invention;

[0032] FIG. 17 shows a diagram of a switching element according to another embodiment of the present invention;

[0033] FIG. 18 shows a diagram of a switching element according to another embodiment of the present invention;

[0034] FIG. 19 shows a diagram of a switching element according to another embodiment of the present invention;

[0035] FIG. 20 shows a block diagram of a data processor according to an embodiment of the present invention;

[0036] FIG. 21 shows a diagram of a data processor according to another embodiment of the present invention;

[0037] FIG. 22 shows a diagram of a switching element according to an embodiment of the present invention;

[0038] FIG. 23 shows a block diagram of a data processor according to another embodiment of the present invention; and

[0039] FIG. 24 shows a data processor according to another embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

[0040] Embodiments of the current invention present aspects of a communications network that allows for communications between a given processing element and other processing elements, as may be found in a processor that implements a plurality of SIMD processing elements, and between a given processor and various regions of memory.

[0041] Referring to FIG. 1, a data processor 1, according to an embodiment of the present invention, comprises a plurality of processor elements (PE) 3, 5, 7 and a memory 9, which may for example comprise a random access memory (RAM). The memory includes a plurality of segments 11, 13, 15 and each segment has an associated input/output port 17, 19, 21 to permit read/write access to a respective memory segment. Each memory segment may comprise a one dimensional array (e.g. a column) or a two-dimensional array (e.g. containing a plurality of rows and columns) of one bit memory or storage elements. The memory may include a row selector to select a particular row of memory elements and each segment may include a column selector to select a particular column of memory elements and to connect that column to the I/O port, if each segment contains more than one column of memory elements.

[0042] The data processor 1 further includes a plurality of switching elements (SE) 23, 25, 27. The first switching element 23 is coupled to the I/O ports 17, 19 of the first and second memory segments 11, 13 and is switchable to selectively couple one of the first and second memory segments 11, 13 to the first processor element 3. Similarly, the second switching element 25 is coupled to the I/O ports 19, 21 of the second and third memory segments 13, 15 and is arranged to selectively couple either the second or third memory segment 13, 15 to the second processor element 5. The third switching element 27 is coupled to the third memory segment 15 and possibly to a fourth memory segment (not shown) and may be arranged to selectively couple the third memory segment 15 or the fourth memory segment to the third processor element 7. The first switching element 23 may be arranged to provide at least one of read access by the first (i.e. its local) processor element to at least one of the first (i.e. its local) memory segment 11 and to the second (i.e. its remote, e.g. neighbouring or more remote than neighbouring) memory segment 13, and write access from the first processor element 3 to at least one of the first and second memory segments 11, 13. Similarly, the second and/or third switching elements 25, 27 may be arranged to provide at least one of read access to at least one of its local and remote memory, and write access from the processor element to at least one of its respective local and remote memories. The switching of each switching element 23, 25, 27 may be controlled by applying a control signal to a respective control signal input port 29, 31, 33.

[0043] Advantageously, the switching arrangement shown in FIG. 1 allows each processor element direct access not only to its local memory segment but also to the local memory segment of another processor element. This enables each processor element to perform calculations based on data contained in its local and associated remote memory segments. For example, in one embodiment, each processing element includes first and second local registers for storing data from its local memory and data from its associated remote memory, respectively, and an arithmetic logic unit (ALU) for performing a calculation based on the contents of these registers. This arrangement is particularly beneficial for image processing, where a comparison is made between the value of one pixel and that of another pixel, for example a neighbouring pixel for motion estimation and/or data compression.

[0044] In one embodiment, each of the switching elements may be controlled to provide the same memory segment to processor element coupling at the same time. For example, each switching element may first be controlled to couple each processor element to its local memory segment for read access. Secondly, each switching element may be controlled to couple each processor element to its respective remote or neighbouring memory segment for read access. Thirdly, each processor element may be controlled to perform an operation based on the data from its local and associated remote memory segment and subsequently to output the result of the operation. Each switching element may then be controlled to write the result of the operation into either its local or remote memory segment. Advantageously, controlling all of the switching elements to perform the same switching operation avoids memory segment and processor element access conflicts and enables all of the switching elements to be controlled simultaneously by the same control signal or instruction. This form of processing is particularly applicable to digital image processing and allows image pixels to be processed in parallel.
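
The lock-step sequence of paragraph [0044] can be illustrated with a short behavioural sketch in Python. It is only a software model under stated assumptions, not the disclosed circuit: the function name `simd_step`, the use of one integer per memory segment, and the handling of the last processor element (which has no segment to its east and here simply re-reads its local value) are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def simd_step(memory: List[int],
              op: Callable[[int, int], int],
              write_to: str = "local") -> List[int]:
    """Model one pass: read local, read east (remote), compute, write back.

    memory[i] stands for the segment local to processor element i.
    All elements perform the same step at the same time, so each phase
    is modelled as one list comprehension over the whole array.
    """
    n = len(memory)
    # Phase 1: every switching element selects its local segment for reading.
    local_reg = [memory[i] for i in range(n)]
    # Phase 2: every switching element selects the segment to its east.
    # Assumption: the last element has no east segment and re-reads its own.
    remote_reg = [memory[i + 1] if i + 1 < n else memory[i] for i in range(n)]
    # Phase 3: every processor element applies the same operation.
    result = [op(a, b) for a, b in zip(local_reg, remote_reg)]
    # Phase 4: every switching element selects the write destination.
    if write_to == "local":
        return result
    if write_to == "remote":
        # Element i writes segment i + 1; segment 0 has no writer and the
        # last element has no east segment, so its result is dropped.
        return memory[:1] + result[:-1]
    raise ValueError("write_to must be 'local' or 'remote'")


def abs_diff(a: int, b: int) -> int:
    """Example operation: absolute difference of two pixel values."""
    return abs(a - b)
```

Because every element uses the same selection in each phase, a single control value (here the `write_to` argument) is enough to drive the whole array, which mirrors the shared control signal described above.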

[0045] The embodiment shown in FIG. 1 illustrates an example of a switching arrangement in which each processor element has access either to its local memory segment or to the memory segment to its right (i.e. East). In another embodiment, each switching element may be arranged to provide each processor element access to its local memory segment or to a remote memory segment to its left (i.e. West). In a further embodiment, each switching element may be arranged to permit each processor element selective access to either its local memory segment, a remote (e.g. neighbouring) memory segment to its right (East), or to a remote (e.g. neighbouring) memory segment to its left (West). An example of such a switching arrangement is shown in FIG. 2.

[0046] Referring to FIG. 2, the switching arrangement 101 according to an embodiment of the present invention comprises a plurality of switching elements 103, 105, 107. Each switching element includes a first port 109, 111, 113, which is coupled to the output of a selector switch (SL1), which selectively couples the first port either to the local memory (segment) associated with each switching element or the local processor element associated with each switching element, the switching being controlled by a control signal applied to a control signal input port 115, 117, 119. Each switching element further comprises a second port 121, 123, 125 (located to the left, or west, of each switching element in this embodiment), and a third port 127, 129, 131 (situated to the right, or east, of each switching element in this embodiment). The east port 127 of the first switching element 103 is coupled to the west port 123 of the second switching element 105, and the east port 129 of the second switching element 105 is coupled to the west port 125 of the third switching element 107. Each switching element 103, 105, 107 further includes first and second switches SW1, SW2, the first switch SW1 being arranged to couple/de-couple the first port 109, 111, 113 of each switching element to the west port 121, 123, 125, and the second switch SW2 being arranged to couple/de-couple the first port 109, 111, 113 to the east port 127, 129, 131 of each switching element.

[0047] Each switching element 103, 105, 107 further includes a second selector switch SL2 connected to each of the first, second and third ports of each switching element, and which is arranged to selectively couple one of the first, second and third ports to a fourth port 133, 135, 137, under the control of a control signal applied to its control signal input port 139, 141, 143. Each of the fourth ports 133, 135, 137 is coupled to a third selector switch SL3 for selectively switching the output from the second selector switch either to the local memory associated with the switching element or to the local processor element associated with the switching element. Switching of each selector switch SL3 is controlled by a control signal applied to its control signal input port 145, 147, 149.

[0048] The switching arrangement 101 shown in FIG. 2 is configurable to permit data transfer between local memory and local processor element, or between the local memory or local processor element associated with one switching element and the local memory or local processor element associated with a neighbouring switching element to which it is coupled. Examples of various modes of operation of the switching arrangement 101 will now be described.

[0049] According to a first mode of operation, each switching element is configured to permit its local processor element to read from its local memory. In this mode, selector switch SL1 is controlled to couple each of the first ports 109, 111, 113 to a respective local memory, switches SW1 and SW2 are both open (as shown), the second selector switch SL2 is controlled to couple the first port 109, 111, 113 to the fourth port 133, 135, 137, and the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to the local processor element.

[0050] In a second mode of operation, each switching element is configured to write from its local processor element to its local memory, in which case the first selector switch SL1 is controlled to couple the output of its local processor element to the first port 109, 111, 113, the first and second switches SW1, SW2 are both open, the second selector switch SL2 is controlled to couple the first port 109, 111, 113 to the fourth port 133, 135, 137, and the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to the local memory.

[0051] In a third mode of operation, each switching element is configured to transfer data from its local memory to the local processor element of the switching element to its right (east). In this mode, the first selector switch SL1 is controlled to couple the first port 109, 111, 113 to the local memory, the first switch SW1 of each switching element is open, the second switch SW2 of each switching element is closed to connect the first port 109, 111, 113 to a respective east port 127, 129, 131, and the second selector switch SL2 is controlled to couple a respective west port 121, 123, 125 to the fourth port 133, 135, 137. In this configuration, the first port 109, 111, 113 of each switching element is connected to the fourth port 135, 137 of the neighbouring switching element to its right. Finally, the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to its local processor element.

[0052] In another mode of operation, each switching element is configured to write from its local processor element into the local memory associated with a switching element to its right (east). In this case, each switching element is configured in a similar way to that described immediately above in connection with a memory read access, except that the first selector switch SL1 is controlled to couple the local PE to the first port 109, 111, 113, and the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to the local memory.

[0053] In another mode of operation, each switching element is configured to transfer data from its local memory to a processor element associated with a switching element to its left (west). In this mode, each first selector switch SL1 is controlled to couple the first port 109, 111, 113 to the local memory, the first switch SW1 is closed to couple the first port 109, 111, 113 to the west port 121, 123, 125, and the second switch SW2 is open. The second selector switch SL2 is controlled to couple the east port 127, 129, 131 of each switching element to the fourth port 133, 135, 137, so that the first port of each switching element is effectively coupled to the fourth port of the switching element to its left. Finally, the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to the local processor element.

[0054] In another mode of operation, each switching element may be configured to write the output of its local processor element to the local memory associated with the switching element to its left (west). This mode is similar to that described immediately above in connection with a west read access, except that the first selector switch SL1 is controlled to couple the first port 109, 111, 113 to the output of its local processor element, and the third selector switch SL3 is controlled to couple the fourth port 133, 135, 137 to its local memory.

[0055] In each of the switching modes described above, each switching element has the same configuration, and therefore, conveniently, the same set of control signals can be applied to the switches SW1, SW2 and the selector switches SL1, SL2, SL3 of each switching element. Therefore, the control lines for corresponding switches and selector switches can be connected together, substantially simplifying and reducing the number of control wires which would otherwise be required if each element was controlled independently of the others.
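
The six modes described for FIG. 2 can be summarised as one shared set of control values per mode. The Python sketch below is a behavioural model only, under stated assumptions: the names `Controls`, `MODES` and `apply_mode`, the mode labels, and the string encodings of the selector positions are hypothetical, and treating SW1 and SW2 both closed as an error is an assumption of this model rather than a statement from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Controls:
    sl1: str   # "mem" or "pe": source coupled to the first port
    sw1: bool  # True = closed: first port coupled to the west port
    sw2: bool  # True = closed: first port coupled to the east port
    sl2: str   # "first", "west" or "east": port coupled to the fourth port
    sl3: str   # "mem" or "pe": destination coupled to the fourth port


# One shared control set per mode (hypothetical labels).
MODES = {
    "mem_to_local_pe": Controls("mem", False, False, "first", "pe"),
    "pe_to_local_mem": Controls("pe", False, False, "first", "mem"),
    "mem_to_east_pe": Controls("mem", False, True, "west", "pe"),
    "pe_to_east_mem": Controls("pe", False, True, "west", "mem"),
    "mem_to_west_pe": Controls("mem", True, False, "east", "pe"),
    "pe_to_west_mem": Controls("pe", True, False, "east", "mem"),
}


def apply_mode(ctrl: Controls, mem: List[int],
               pe: List[int]) -> Tuple[List[int], List[int]]:
    """Apply the same controls to every element; return (new_mem, new_pe)."""
    if ctrl.sw1 and ctrl.sw2:
        # Assumption of this model: both closed would tie neighbouring
        # first ports together, so it is rejected.
        raise ValueError("SW1 and SW2 must not both be closed")
    n = len(mem)
    first = [mem[i] if ctrl.sl1 == "mem" else pe[i] for i in range(n)]
    # link[i] is the wire from the east port of element i to the west
    # port of element i + 1; it carries whichever first port drives it.
    link: List[Optional[int]] = [None] * max(n - 1, 0)
    for i in range(n - 1):
        if ctrl.sw2:
            link[i] = first[i]
        elif ctrl.sw1:
            link[i] = first[i + 1]
    west = [None] + link   # value seen at each element's west port
    east = link + [None]   # value seen at each element's east port
    fourth = {"first": first, "west": west, "east": east}[ctrl.sl2]
    new_mem, new_pe = list(mem), list(pe)
    for i, value in enumerate(fourth):
        if value is None:
            continue  # undriven port at the edge of the array
        if ctrl.sl3 == "mem":
            new_mem[i] = value
        else:
            new_pe[i] = value
    return new_mem, new_pe
```

Since `apply_mode` takes a single `Controls` value for the whole array, the sketch reflects the point made above: corresponding control lines of all switching elements can be tied together.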

[0056] Any number of switching elements may be used to transfer data between any number of processor elements or their respective memory segments. Generally, one switching element is required for each processor element.

[0057] Byte-Wise Processing and Data Transfer

[0058] In one embodiment, each processor element is capable of processing a single bit, and a plurality of processor elements may be functionally grouped together to process a multiple bit word, for example, a byte (8 bits). A group of processor elements which function to operate on a multiple bit word will be referred to as a computational unit (CU).

[0059] FIG. 3 shows an example of a data processor having a plurality of computational units, and a switching arrangement which allows the transfer of multiple bit words between computational units and/or between computational units and the local memories associated with other computational units (and optionally between memory segments).

[0060] Referring to FIG. 3, the data processor 201 comprises a plurality of computational units (CU) 203, a memory block 205 and a plurality of groups of switching elements (SEs) 207. In this embodiment, the computational units are arranged in a two-dimensional array having two rows, row 1 and row 2, and n columns. Each computational unit has 8 processor elements, allowing the computational unit to perform byte processing, and an associated group of switching elements 207, each containing 8 switching elements, one switching element for each processor element. This configuration is shown in FIG. 4, in which a computational unit 203 includes 8 processing units 211, each connected to a switching element 213. Returning to FIG. 3, each computational unit 203 has 8 memory I/O ports associated therewith, one for each processor element of the computational unit, and in which a respective I/O port is coupleable to a respective processor element through a respective switching element. Thus, in this embodiment, each 8 bit computational unit is pitch-matched to the I/O ports of the memory. In the arrangement shown in FIG. 3, the first set of 8 I/O ports 210 and each alternate set of 8 I/O ports is associated with a successive computational unit 203 in the first row, and the second set of 8 I/O ports 212 and each alternate set of 8 I/O ports is associated with each successive computational unit in the second row.
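
The alternating assignment of I/O port sets to the two rows can be written as a small index calculation. The Python sketch below is an illustration under assumptions: the function name `port_to_cu` is hypothetical, ports, rows, columns and bits are numbered from 0 (so row 0 here corresponds to "row 1" above), and the mapping of port order within a set to processor element position is assumed rather than taken from the description.

```python
from typing import Tuple


def port_to_cu(port: int, bits: int = 8, rows: int = 2) -> Tuple[int, int, int]:
    """Map a memory I/O port index to (row, column, bit) of a computational unit.

    Ports are taken in sets of `bits`; successive sets alternate between
    the rows, and each pair of sets advances one column.
    """
    if port < 0:
        raise ValueError("port index must be non-negative")
    port_set, bit = divmod(port, bits)      # which set of 8, and position in it
    column, row = divmod(port_set, rows)    # sets alternate between the rows
    return row, column, bit
```

For example, under this numbering ports 0 to 7 belong to the first computational unit of the first row, ports 8 to 15 to the first unit of the second row, and ports 16 to 23 to the second unit of the first row.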

[0061] Each group of 8 switching elements 207 is connected to an adjacent group of switching elements to allow byte transfer of data from one group to an adjacent group, as shown in FIG. 5.

[0062] Referring to FIG. 5, four groups of switching elements 215, 217, 219, 221 are shown, in which the first two groups 215, 217 occupy a first row 223 and the third and fourth groups 219, 221 occupy a second row 225. Each group contains eight switching elements labelled 0 to 7. Each group has an associated computational unit and an associated local memory (not shown), wherein each local memory is capable of storing at least one byte of data (containing 8 bits), and each computational unit is capable of byte-wise processing.

[0063] In order to perform byte-wise transfer between a computational unit or local memory associated with one group of switching elements and a computational unit or local memory associated with an adjacent group of switching elements, each switching element of one group is connected to a switching element of the adjacent group which has a corresponding bit significance. Thus, as shown in FIG. 5, switching element 0 of the first group 215 is directly connected to switching element 0 of the second, adjacent group 217 (where position 0 may correspond to the least significant bit (LSB)). Similarly, each of the other switching elements 1 to 7 of the first group 215 is connected to a corresponding switching element 1 to 7 of the second group 217 (where position 7 may correspond to the most significant bit (MSB)). These connections allow east-west or west-east byte-wise transfer between the local memory or computational unit associated with the first group of switching elements 215 and that associated with the second group of switching elements 217. The switching elements of the third and fourth groups 219, 221 are connected in like manner to permit east to west or west to east byte-wise transfer between the computational unit or local memory associated with the third group of switching elements 219 and the computational unit or local memory associated with the fourth group of switching elements 221.

[0064] In the embodiment shown in FIG. 5, the first and third groups of switching elements 215, 219 are also connected to permit north-south or south-north byte-wise transfer between the computational unit or local memory associated with the first group of switching elements 215 and the computational unit or local memory associated with the third group of switching elements 219. To permit this byte-wise transfer, each switching element associated with the first group 215 is connected to a switching element of the third group 219 which has corresponding bit significance. Thus, the switching element 0 of the first group 215 is directly connected to the switching element 0 of the third group 219, and each of the other switching elements 1 to 7 of the first group 215 is directly connected to a corresponding switching element 1 to 7 of the third group, as shown by the arrows 224. The second and fourth groups of switching elements 217, 221 are also coupled to permit north to south and south to north byte-wise transfer between the computational unit or local memory associated with the second group of switching elements 217 and the computational unit or local memory associated with the fourth group of switching elements 221. Again, to enable this byte-wise transfer, each of the switching elements of the second group 217 is connected to a switching element of the fourth group 221 having the same bit significance. Thus, the switching element 0 of the second group 217 is directly coupled to the switching element 0 of the fourth group 221 and each of the other switching elements 1 to 7 of the second group 217 is directly coupled to a corresponding switching element 1 to 7 of the fourth group 221, as shown by the arrows 226.

[0065] It can be seen from the embodiment shown in FIG. 5 that the switching arrangement allows byte-wise transfer between the computational unit or local memory associated with one group of switching elements and the computational unit or local memory associated with an adjacent group of switching elements displaced in the east/west direction and a group of switching elements displaced in the north/south direction. The array of groups of switching elements can be extended to any size of array so that byte-wise data transfer can be permitted between a computational unit and/or local memory associated with any group of switching elements and the computational unit and/or local memory associated with any adjacent group of switching elements in the same row or the same column. This arrangement is particularly advantageous in video image processing as it allows a pixel, which is typically described by an 8-bit word, to be compared with each of its north, south, east and west neighbours, and each comparison can be performed for all pixels in parallel. Thus, for example, one parallel operation may involve the transfer of a byte of data from the memory associated with each group of switching elements to the computational unit associated with its adjacent southern neighbour, for comparison with a byte stored in the memory of its southern neighbour. Another parallel operation may involve the transfer of a byte from a local memory associated with each group of switching elements into the computational unit associated with its immediate northern neighbour for comparison with a byte stored in the memory associated with its northern neighbour. Other parallel operations including, for example, east and west byte-wise transfers may also be performed. In addition to performing north to south (or south to north) and east to west (or west to east) byte-wise transfers, it may also be desirable to perform byte-wise transfers between neighbouring groups of switching elements on the diagonal. An example of a two-dimensional array of computational units in which byte-wise transfers can be made between a computational unit and its nearest eight neighbours is shown in FIG. 6.
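The parallel north-to-south transfer and comparison described above can be illustrated in software. The following is only a minimal sketch, assuming one byte per computational unit held in a list of rows with row 0 as the northernmost row; the function names and the fill value used at the northern edge are hypothetical and are not taken from the embodiment.

```python
def shift_south(grid, fill=0):
    """Every cell receives the byte held by its northern neighbour.

    Cells in the northernmost row have no northern neighbour and
    receive the (assumed) fill value instead.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[r - 1][c] if r > 0 else fill for c in range(cols)]
        for r in range(rows)
    ]


def compare_with_north(grid):
    """Absolute difference between each byte and the byte received from the north."""
    received = shift_south(grid)
    return [
        [abs(own - got) for own, got in zip(own_row, got_row)]
        for own_row, got_row in zip(grid, received)
    ]


pixels = [[10, 20],
          [30, 40]]
differences = compare_with_north(pixels)
```

In the hardware all cells perform the transfer and the comparison at the same time; the list comprehensions merely stand in for that parallelism.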

[0066] FIG. 6 shows a two-dimensional, three-by-three array 251 of nine computational units, in which the central computational unit 253 has 8 nearest neighbours, which include its adjacent computational units 255 to the north and 257 to the south in the same column, its adjacent computational units 259 to the west and 261 to the east in the same row, its immediately adjacent computational units 263 to the north-west and 265 to the south-east on one diagonal, and its immediately adjacent computational units 267 to the north-east and 269 to the south-west on the other diagonal. In order to permit data transfer between the central computational unit 253 or its local memory and the computational unit or local memory of each of its 8 nearest neighbours, each computational unit and local memory has an associated switching element. In one embodiment, the switching element associated with the central computational unit 253 may be connected to each of the switching elements associated with each of its nearest neighbours and each switching element may have 8 input-output ports or buses extending therefrom to connect to each of its neighbours, as shown schematically by the arrows 271 in FIG. 6. An example of an interconnect scheme in which each switching element is connected to each of its nearest neighbours by one of eight buses is shown in FIG. 7. FIG. 7 shows a two-dimensional, three-by-three array 273 of switching elements 275, in which each switching element has 8 ports or buses 277 extending therefrom for connecting to each of its nearest neighbour switching elements (for example to a bus or port of each element).

[0067] FIG. 8 shows an embodiment of a switching element which may be implemented in the interconnect scheme of the embodiment shown in FIG. 7. The switching element comprises a first multiplexer 240 having nine inputs 242 and an output 244. Eight of the inputs are for coupling to neighbouring switching elements and the remaining input is for coupling to the local unit, for example a memory or processor element output. The switching element further includes a second multiplexer 246 having a first input 248 coupled to an output port of the local PE, and a second input port 250 for receiving data from the local unit, e.g. the local memory. The output of the second multiplexer is broadcast to other switching elements. The switching element further includes a third multiplexer having a first input connected to the output of the first multiplexer, and a second input coupled to an output of the local processor element. The output of the third multiplexer is coupled to the local memory input.

[0068] An embodiment of a switching arrangement which advantageously allows the number of buses to be reduced, while still retaining the connectivity of a local memory and/or processor element to its nearest 8 neighbours, will now be described. Referring to FIG. 9, a switching element 301 has four input/output ports 303, 305, 307, 309. In this embodiment, the first and third input/output ports 303, 307 lie on a first, north to south axis and the second and fourth input/output ports 305, 309 lie on a transverse axis, which in this embodiment is a substantially orthogonal, east to west axis. This enables other similar switching elements to be connected to each of the input/output ports in a two-dimensional grid pattern (which will be described below), although other arrangements are also possible.

[0069] The switching element 301 further includes four switches 311, 313, 315, 317, the first switch 311 being connected to the first input/output port 303, the second switch 313 being connected to the second input/output port 305, the third switch 315 being connected to the third input/output port 307 and the fourth switch 317 being connected to the fourth input/output port 309. Each switch is switchable between three terminals 319, 321, 323, and further includes at least one neutral position in which the switch is not connected to any of the three terminals. The switching element 301 further includes an input port 325 which is connected to each middle terminal 321 of the four switches. Each one of the other two terminals of each switch is connected to the terminal of an orthogonal switch on the same side. Thus, the first terminal 319 associated with the first switch 311 is connected to the third terminal 323 of the fourth switch 317. The third terminal 323 associated with the first switch 311 is connected to the first terminal 319 of the second switch 313. Similarly, the first terminal 319 of the third switch 315 is connected to the third terminal 323 of the second switch 313, and the third terminal 323 of the third switch 315 is connected to the first terminal 319 of the fourth switch 317.

[0070] The switching element further includes an output selector switch 327 having four inputs A, B, C and D and an output 329. A respective one of the four inputs A, B, C and D is connected to a respective one of the interconnected switch terminal couples 319, 323 also labelled A, B, C and D (although these interconnections are not shown for clarity). This embodiment further includes a second selector switch 341 having two inputs 343, 345 and an output 347. The first input 343 is connected to the output 329 of the first selector switch 327 and the second input 345 is connected to the input port 325 of the switching element (although this connection is also not shown for clarity).

[0071] The switching element 301 is capable of directing data received at the input 325 either to local elements (e.g. local memory or local processor element) associated with the switching element, or to one of the four input/output ports 303, 305, 307, 309. The input port 325 may be coupled to receive data from its associated local memory, or the output of its associated local processor element, and a switch may be provided to selectively couple the input port 325 to one of the associated local memory and an output of the associated processor element.

[0072] The switching element 301 is also capable of directing data received at any one of its four input/output ports 303, 305, 307, 309 to the output port 329 of the first selector switch 327 and to the output port 347 of the second selector switch 341, depending on its state. The switching element is also capable of transferring data received at the input port 325 to the output port 347 of the second selector switch 341, again depending on its state. In one embodiment, the output 347 of the second selector switch 341 is connected to the input of the local processor element associated with the switching element 301. Thus, in one mode of operation, the switching element is capable of transferring data received at its input 325 to its output 347, for example to enable the transfer of data from its local memory to its local processor element. In another mode of operation, the switching element is capable of transferring data received at any one of the four input/output ports to its associated local processor element. Examples of the various possible operating modes of the switching element 301 will now be described.

[0073] To perform a data transfer to the south, the third switch 315 is controlled to couple the input port 325 to the third (south) input/output port 307 so that data from the local unit (e.g. local memory or local processor) is transferred to the south port 307. At the same time, the first switch 311 is controlled to couple the first (north) input/output port 303 either to its first terminal 319 or to its third terminal 323, thereby coupling the north port 303 either to the first input A or the second input B of the first selector switch 327. The first selector switch 327 is controlled to couple the input port to which the first switch 311 is connected to the output port 329. The second selector switch 341 is controlled to connect its first input port 343 to the output port 347, which is connected to an input of the local unit (e.g. memory or local processor) associated with the switching element 301. Thus, in this mode of operation, the switching element is configured to transfer data from the local unit to the south port 307 and to transfer data into the local unit via the north port 303.

[0074] In another mode of operation, the switching element 301 may be controlled to transfer locally derived data to its north neighbour and simultaneously to receive data derived from its southern neighbour. In this mode, the first switch 311 is controlled to connect the north input/output port 303 to the input port 325 of the switching element 301, and the third switch 315 is controlled to connect the south port 307 to one of the first and third terminals 319, 323 associated with the third switch 315, thereby connecting the south port 307 to one of the third or fourth inputs C, D of the first selector switch 327. The second selector switch 341 is controlled to output the data to the local unit.

[0075] In another mode of operation, the switching element may be configured to transfer locally derived data to its east neighbour and to receive data from its west neighbour. In this mode of operation, the second switch 313 is controlled to couple the input 325 of the switching element 301 to the east port 305, and the fourth switch 317 is controlled to connect the west port 309 to one of the first or third terminals 319, 323 associated with the fourth switch 317 to couple the west port 309 to one of the first and third input ports A, C of the first selector switch 327. The first selector switch 327 is controlled to connect the appropriate input port A, C to its output port 329 and the second selector switch 341 is appropriately controlled to output the data received at the west port 309 to the local unit.

[0076] In a fourth mode of operation, the switching element 301 is configured to transfer data to its west neighbour and to receive data from its east neighbour. In this mode, the fourth switch 317 is controlled to couple the input port 325 of the switching element 301 to the west port 309, and the second switch 313 is controlled to couple the east port 305 to one of the first and third terminals 319, 323 associated with the second switch 313, so that the east port 305 is coupled to one of the second and fourth inputs B, D of the first selector switch 327. The first and second selector switches 327, 341 are appropriately controlled so that data received at the east port 305 is transferred to an input of the local unit (e.g. local memory or local processor element).

[0077] In each of the modes described above for a north to south, south to north, east to west and west to east transfer, the switching element 301 assumes both an output mode for transferring data out from one of its north, south, east or west ports and an input mode for receiving data at the respective opposite port. The switching element 301 can also be configured to operate in any one of four further modes to allow the transfer of data received at any one of its input/output ports 303, 305, 307, 309 to an adjacent orthogonal input/output port 303, 305, 307, 309 for transferring data between diagonally disposed nearest neighbours. To perform a diagonal data transfer, each switching element 301 is configured to operate in an input mode, for receiving data at one of its input/output ports 303, 305, 307, 309 and transferring the received data to the local unit, an orthogonal output mode, in which locally derived data is output at a port which is orthogonal to the input/output port established for the input mode, and a by-pass or pass-through mode, in which the other two adjacent orthogonal input/output ports are interconnected to allow the transfer of data received at one of these ports to the other remaining port. The switching element 301 is capable of adopting all three modes simultaneously to enable diagonal (e.g. NE, NW, SE, SW) nearest neighbour or more remote diagonal data transfers.

[0078] An example of how switching elements may be configured to enable a diagonal data transfer will now be described with reference to FIG. 10.

[0079] FIG. 10 shows an array of three switching elements 351, 353, 355, each of which is similar to the switching element 301 shown in FIG. 9, and like parts are designated by the same reference numerals. The switching elements are interconnected such that the east input/output port 305 of the first switching element 351 is connected to the west input/output port 309 of the second switching element 353, and the north input/output port 303 of the second switching element 353 is connected to the south input/output port 307 of the third switching element 355. Thus, the second switching element 353 constitutes the east neighbour (which may or may not be the nearest east neighbour) to the first switching element 351, and the third switching element 355 constitutes the Northeast neighbour (which may or may not be the nearest Northeast neighbour) to the first switching element 351. It is to be noted that the array of three switching elements shown in FIG. 10 may be part of a larger array of switching elements and that FIG. 10 simply serves to illustrate an arrangement of switching elements which enables a transfer between one switching element and another switching element on its Northeast diagonal.

[0080] To perform a transfer of data locally derived at the third switching element 355 to the first switching element 351, the third switching element 355 assumes an output mode, in which the third switch 315 is controlled to connect the input port 325 of the third switching element 355 to its third (south) input/output port 307, so that locally derived data is transferred onto the output port 307. The second switching element 353 assumes a pass-through mode, in which the first switch 311 is connected to its associated first terminal 319, and the fourth switch 317 of the second switching element 353 is connected to its associated third terminal 323, thereby connecting the first (north) input/output port 303 of the second switching element to its fourth (west) input/output port 309. The first switching element 351 assumes an input mode, in which the second switch 313 of the first switching element 351 is controlled to connect the second (east) input/output port 305 to the third terminal 323 associated with its second switch 313 so that the east input/output port 305 is connected to the fourth input port D of the selector switch 327. The selector switch 327 is controlled to connect the fourth input D to its output port 329 (not labelled on FIG. 10). Thus, it can be seen that the first, second and third switching elements are configured to provide a continuous path from the input port 325 of the third switching element 355 to the output port 329 of the first switching element 351.

[0081] It is to be noted in this operating mode that the switching elements 351, 353, 355 may all assume the same three operating modes: output mode, pass-through mode and input mode, so that each switching element assumes an output mode in which a respective input port 325 is connected to a respective south port 307, a pass-through mode, in which the first and fourth switches are controlled to couple a respective north port 303 to a respective west port 309, and an input mode, in which a respective east port 305 is connected to a respective output port 329. This symmetrical configuration of all of the switching elements allows same direction, parallel, diagonal transfers between diagonally disposed switching elements in the array. Since each switching element is configured in the same way to perform a given diagonal transfer, all of the switching elements may be configured by the same set of control signals, which enables the control of data transfers between the switching elements to be considerably simplified.
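The effect of configuring every switching element identically can be sketched in software. The following is only an illustrative model, assuming a rectangular array stored as a list of rows with row 0 as the northernmost row and column 0 as the westernmost column; the function name and the fill value used at the array edges are hypothetical.

```python
def northeast_transfer(local, fill=None):
    """Model every element adopting output, pass-through and input modes at once.

    Output mode:       local data is driven onto the south port.
    Pass-through mode: data arriving at the north port is forwarded to the west port.
    Input mode:        data arriving at the east port is delivered to the local unit.
    """
    rows, cols = len(local), len(local[0])
    # Value each element drives onto its south port.
    south_out = [row[:] for row in local]
    # Value each element forwards to its west port: whatever its north
    # neighbour drove southwards (nothing at the northern edge).
    west_out = [
        [south_out[r - 1][c] if r > 0 else fill for c in range(cols)]
        for r in range(rows)
    ]
    # Value each element receives at its east port: whatever its east
    # neighbour forwarded westwards (nothing at the eastern edge).
    return [
        [west_out[r][c + 1] if c + 1 < cols else fill for c in range(cols)]
        for r in range(rows)
    ]


array = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
received = northeast_transfer(array)
```

With this configuration the element holding 4 receives 2, which is the value held by its Northeast neighbour, and every interior element is served in the same step.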

[0082] Advantageously, as each switching element is designed to enable data to be transferred from any of its input/output ports to either of its adjacent, orthogonal input/output ports, an effective diagonal data transfer can be performed using the same buses which are used for north-south and east-west data transfers, thereby obviating the need for additional diagonal buses between diagonally disposed switching elements. Thus, the switching elements allow data transfer between a given switching element and its nearest eight neighbours using only four buses rather than eight, thereby avoiding the cost and complexity of the additional interconnects which would otherwise be required. This arrangement also alleviates the problem of superimposing or combining a diagonal interconnect system with a “Manhattan geometry”, in which the interconnect wires are drawn orthogonally, i.e. in the north, south, east and west directions.

[0083] In the embodiments of the switching elements shown in FIGS. 9 and 10, each input mode from any of the input/output ports 303, 305, 307, 309 has two possible configurations. The first configuration is illustrated in FIG. 10, in which, for the input mode associated with the transfer of data into the switching element from the east input/output port 305, the second switch 313 is connected to the third terminal 323 associated with the second switch, so that the east input/output port is connected to the fourth port D of the selector switch 327. In the alternative configuration of this input mode, the second switch 313 may be controlled to couple to the first terminal 319 associated with the second switch 313 so that the east input/output port 305 is connected to the second input port B of the selector switch 327, in which case the selector switch is controlled to connect the second input port B to its output port 329.

[0084] The embodiments of the switching elements shown in FIGS. 9 and 10 allow any diagonal transfer to be performed via all possible routes. In the case of a transfer to a switching element from a Northeast neighbour, one route is to pass the data via its east neighbour, as shown in FIG. 10, and the second route is to pass the data via its north neighbour, for example north neighbour 357 shown schematically in FIG. 10.

[0085] In another embodiment, each of the switching elements may be simplified to permit a diagonal transfer using only one of these two possible routes. An example of another embodiment of a switching element is shown in FIG. 11, and an example of an array of such switching elements is shown in FIG. 12. The switching element shown in FIG. 11 is similar to that shown in FIG. 9, and like parts are designated by the same reference numerals. The switching element 301 comprises four input/output ports 303, 305, 307, 309 having associated switches 311, 313, 315, 317. Each switch has first and second terminals 319, 321, the first terminal 319 (also labelled N, E, S and W) being connected to a respective one of four input terminals, also labelled N, E, S, W, of a first selector switch 327. The first terminal 319 associated with each switch is also connected to the input/output port connected to its adjacent switch displaced in the clockwise sense. Thus, the first terminal 319 associated with the first switch 311 (connected to the first (north) input/output port 303) is connected to the second (east) input/output port 305, the first terminal 319 associated with the second switch 313 is connected to the third (south) input/output port 307, the first terminal 319 associated with the third switch 315 is connected to the fourth (west) input/output port 309, and the first terminal 319 of the fourth switch 317 is connected to the first (north) input/output port 303. The second terminal 321 of each switch is connected to the input port 325 of the switching element.

[0086] Each switch is switchable between its associated first and second terminals 319, 321, and a third, neutral or floating state (e.g. between the two terminals), as shown in FIG. 11.

[0087] FIG. 12 shows an array of four switching elements of the kind shown in FIG. 11 and will be used to illustrate a diagonal transfer from a Northeast neighbour. The switching elements are arranged such that the north port 303 of the first switching element 371 is connected to the south port 307 of the second switching element (north neighbour) 373, and the east port 305 of the second switching element 373 is connected to the west port 309 of the third switching element 375, which constitutes the Northeast neighbour of the first switching element 371.

[0088] To transfer data from the third switching element 375 to the first switching element 371, the third switching element 375 adopts an output mode in which the fourth switch 317 couples the input port 325 to the fourth (west) input/output port 309, the second switching element 373 adopts a pass-through mode, in which the second switch 313 connects the second (east) port 305 to the third (south) input/output port 307, and the first switching element 371 adopts an input mode, in which the first switch 311 connected to the first (north) input/output port 303 is in its neutral (or floating) position and the selector switch 327 is controlled to couple the first input port N to its output port. In this way, data can be transferred from the local unit (e.g. local memory or processor element) associated with the third (Northeast) switching element 375 to the local unit (e.g. local memory or processor element) associated with the first switching element 371.

[0089] It is to be noted that the output mode adopted by the third switching element 375, the pass-through mode adopted by the second switching element 373 and the input mode adopted by the first switching element 371 may all be adopted by all switching elements, as shown in FIG. 12. This enables data to be transferred to each of the second and third switching elements from their respective Northeast neighbours, in an extended array.

[0090] It is to be noted that the array shown in FIG. 12 can be extended to any size, and that the switching elements enable data to be transferred in parallel in any of N, S, E, W, NE, NW, SE, SW directions.

[0091] Embodiments of the switching element may be implemented using any suitable components, and fabricated using any suitable technology (e.g. CMOS and/or passgate). For example, the switching element may comprise any suitable logic circuitry which can be controlled to perform the required function. An example of one implementation of the switching element of FIG. 11 is shown in FIG. 13, where corresponding parts are designated by the same reference numerals.

[0092] Referring to FIG. 13, a switching element 301 has first, second, third and fourth input/output ports 303, 305, 307, 309, each of which has an associated switch 311, 313, 315, 317, as for the embodiment of FIG. 11. Each switch comprises a 2 to 1 multiplexer 350 having a first input port 352 connected to the input port 325 of the switching element, and a second input port 354 connected to the first terminal 319 of each switch. Each switch further comprises a tri-stateable inverter 356 whose input 358 is coupled to the output port 360 of the 2 to 1 multiplexer 350. The output port 362 of the tri-stateable inverter is connected to a respective input/output port 303, 305, 307, 309, with which the switch is associated.

[0093] The switching element 301 further includes first, second, third and fourth inverters 362, 364, 366, 368, each having an input port 370 connected to a respective input/output port 303, 305, 307, 309, and an output port 372 connected to a respective first terminal 319 of the adjacent switch to its left, so that, for example, the output port 372 of the inverter connected to the north port 303 is connected to the first terminal of the fourth switch 317.

[0094] The switching element 301 also comprises a first selector switch 327, which may comprise a multiplexer, having four inputs labelled N, E, S and W which are connected to a respective terminal N, E, S and W coupled to a respective input/output port 303, 305, 307, 309 via an inverter 302. The first selector switch 327 has two control signal input ports 328, 330 for receiving a control signal for selectively switching one of the inputs N, E, S, W to the output 329. In one embodiment, the output of the first selector switch 327 could be connected to the local processor element associated with the switching element. In another embodiment, the output of the first selector switch 327 could be coupled to the local memory associated with the switching element. In another embodiment, a further selector switch (not shown) may be provided to selectively couple the output of the first selector switch 327 to one of an input to the local memory and an input to the local processor element. In the embodiment shown in FIG. 13, a second selector switch 341 is optionally provided, which may also comprise a multiplexer, and has first and second input ports 343, 345, the first input port 343 being connected to the output port 329 of the first selector switch 327, and the second input port 345 being connected to the input port 325 of the switching element 301. The output port of the second selector switch 341 may be coupled to an input of the processor element associated with the switching element 301. The second selector switch 341 has a control signal input port 342 for selectively connecting the output port to one of the first and second input ports 343, 345, in response to a control signal. The second selector switch enables data to be transferred either locally between the local memory and the local processor element or from one of the input/output ports of the switching element into the local processor element.
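The two-stage selection performed by the first and second selector switches can be sketched as follows. This is only an illustrative model: the function name, the use of a dictionary keyed by N, E, S and W, and the boolean by-pass flag are assumptions made for the example, and the actual encoding of the control signals is not specified here.

```python
def route_to_processor(port_data, select, local_data, bypass):
    """Model the first (4 to 1) and second (2 to 1) selector switches.

    port_data:  values present on the N, E, S and W selector inputs.
    select:     which of 'N', 'E', 'S', 'W' the first selector passes on.
    local_data: value present at the input port of the switching element.
    bypass:     True to pass the local value straight to the processor element.
    """
    # First selector switch: choose one of the four neighbour-facing inputs.
    first_output = port_data[select]
    # Second selector switch: choose between that output and the local input.
    return local_data if bypass else first_output


ports = {"N": 0x11, "E": 0x22, "S": 0x33, "W": 0x44}
from_east = route_to_processor(ports, "E", local_data=0x55, bypass=False)
from_local = route_to_processor(ports, "E", local_data=0x55, bypass=True)
```

The by-pass case corresponds to a purely local transfer, for example from the local memory to the local processor element.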

[0095] The 2 to 1 multiplexer 350 of each switch has a control signal input port 374 for receiving a control signal which selectively couples its output port 360 to one of the input port 325 of the switching element 301 and the input/output port which is adjacent and displaced in a clockwise direction relative to the input/output port to which the output of the 2 to 1 multiplexer is connected through its respective tri-stateable inverter 356.

[0096] The tri-stateable inverter 356 of each switch 311, 313, 315, 317 has a control signal input port 376 for receiving a control signal which, with the signal at the input port, controls the signal output from the inverter, according to the table shown in FIG. 14. As can be seen from the table, the tri-stateable inverter can assume a neutral floating position, which implements the neutral or floating position of each switch of the switching element shown in FIG. 11. In this embodiment, the tri-stateable inverter is controlled to adopt this state when the control signal is low (e.g. zero) as shown in the table. When the control signal is in the other state (i.e. high or one) the tri-stateable inverter functions as a simple inverter, by inverting the input signal.
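The behaviour just described can be expressed as a small function. This is only a sketch based on the description above (floating when the control signal is low, inverting when it is high); representing the floating state by None is an assumption made for the example, and the function name is hypothetical.

```python
def tristate_inverter(data, control):
    """Return the inverted bit when control is high, or None when floating."""
    if control == 0:
        return None  # neutral / floating state: the output is not driven
    return 1 - data  # control high: behaves as a simple inverter


# Enumerate every (data, control) combination.
table = {(d, c): tristate_inverter(d, c) for d in (0, 1) for c in (0, 1)}
```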

[0097] The inverters are arranged such that on transferring data between one switching element and any one of its eight neighbours (i.e. N, E, S, W, NE, SE, SW and NW), the data always passes through a minimum of two inverters and always passes through an even number of inverters to maintain signal polarity. Advantageously, passing binary data through an inverter pair enhances the signal strength and signal definition, and also enhances the definition of the high to low or low to high transition edge, so that the inverter pair functions as a repeater.

[0098] In an array of switching elements of the kind shown in FIG. 13, data which is transferred between one switching element and any one of its north, south, east or west neighbours passes through two inverters, one being provided by a tri-stateable inverter 356 on passing the data from the input port 325 to one of the input/output ports 303, 305, 307, 309, and the other being provided by a simple inverter 364 on passing the data from the opposite input/output port of the neighbouring switching element to its output port 329.

[0099] In transferring data between one switching element and a neighbouring switching element on a diagonal (e.g. NE, SE, SW, or NW), the data passes through four inverters, the first being provided by a tri-stateable inverter 356 between the input port 325 of one switching element and the selected input/output port 303, 305, 307, 309, the second and third being provided by a simple inverter 362 and a tri-stateable inverter 356, respectively, between adjacent orthogonal ports of the intermediate switching element, which effectively turns the data through 90°, and the fourth being provided by the simple inverter 364 between the input/output port 303, 305, 307, 309 and the output port 329 of the destination switching element.
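The polarity argument can be checked with a short model. The sketch below simply lists the inverters encountered on an orthogonal path and on a diagonal path as described above and applies them in turn; the constant and function names are hypothetical.

```python
# Inverters encountered on each kind of path, in order.
ORTHOGONAL_PATH = ["tri-stateable", "simple"]
DIAGONAL_PATH = ["tri-stateable", "simple", "tri-stateable", "simple"]


def propagate(bit, path):
    """Pass a bit through every inverter on the path (each stage flips it)."""
    for _stage in path:
        bit ^= 1
    return bit


# An even number of inversions leaves the bit unchanged at the destination.
orthogonal_ok = all(propagate(b, ORTHOGONAL_PATH) == b for b in (0, 1))
diagonal_ok = all(propagate(b, DIAGONAL_PATH) == b for b in (0, 1))
```

Because both paths contain an even number of inverting stages, the received bit always has the same polarity as the transmitted bit.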

[0100] In the embodiment of the switching element 301 shown in FIG. 13, a total of 11 (SO, SO(bar), REN, REN(bar), UEN, UEN(bar), S1, S2, Den) control signals are required to control all of the switching components. However, the switching element may be implemented such that the control signals required for certain switching components can always be the inverse of those required for other switching components. The number of externally applied control signals can therefore be reduced by providing local inversion. The switching element shown in FIG. 13 can be controlled to assume any configuration required for transferring data between its eight neighbours using just six control signals.

[0101] In the implementation shown in FIG. 13, the control signals applied to the 2 to 1 multiplexers 350 of the switches associated with the second and fourth input/output ports 305, 309 can always be the same. Similarly, the control signals applied to the 2 to 1 multiplexers 350 associated with the first and third input/output ports 303, 307 may also always be the same and can always be the inversion of the control signal applied to the other two 2 to 1 multiplexers associated with the second and fourth input/output ports. Thus, only a single external control signal SO is required to control all 2 to 1 multiplexers 350, this single control signal being split locally for the 2 to 1 multiplexers 350 associated with the second and fourth input/output ports 305, 309, and locally inverted and split to control the other two 2 to 1 multiplexers 350 associated with the first and third input/output ports 303, 307 (or vice versa). Therefore, this implementation reduces the number of externally applied control signals by three.

[0102] The embodiment of the switching element 301 shown in FIG. 13 is arranged such that the control signal, REN, applied to one of the tri-stateable inverters 356 of the second and fourth switches 313, 317 is always the inversion of the signal applied to the other tri-stateable inverter. Likewise, the switching element 301 may be implemented such that the control signal, UEN, applied to one of the tri-stateable inverters of the first and third switches 311, 315 is always the inversion of that applied to the other tri-stateable inverter. In this way, the control signal for controlling the tri-stateable inverters associated with the second and fourth input/output ports may be inverted locally, and the non-inverted signal applied to one tri-stateable inverter, and the inverted signal applied to the other. This enables the number of required control signals to be reduced by one. Similarly, a single control signal for controlling the tri-stateable inverters 356 associated with the first and third input/output ports may also be inverted locally, and the non-inverted signal applied to one of these two tri-stateable inverters, and the inverted signal applied to the other. This enables the number of required externally applied control signals to be reduced by a further one, so that the total number of external control signals is reduced by a total of five, from eleven to six. The control signals required to configure the switching element 301 to transfer data between one switching element and another in any of the eight possible directions are given in Table 2 of FIG. 15. This table includes a by-pass mode, in which data is transferred locally, for example from the local memory to the local processor element.
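The reduction from eleven internal control signals to six external ones by local inversion can be sketched as follows. This is only an illustration: which port of each pair receives the inverted signal, the assignment of S1 and S2 to the first selector switch and of Den to the second selector switch, and the function name are all assumptions made for the example.

```python
def expand_controls(so, ren, uen, s1, s2, den):
    """Derive the eleven internal control signals from six external ones."""
    return {
        # 2 to 1 multiplexer selects: one shared signal for east/west and
        # its local inversion for north/south (or vice versa).
        "mux_select": {"E": so, "W": so, "N": 1 - so, "S": 1 - so},
        # Tri-stateable inverter enables: each opposite pair shares one
        # external signal, applied directly to one port and inverted to the other.
        "tri_enable": {"E": ren, "W": 1 - ren, "N": uen, "S": 1 - uen},
        # Selector switch controls (assignment assumed for illustration).
        "first_selector": (s1, s2),
        "second_selector": den,
    }


signals = expand_controls(so=1, ren=0, uen=1, s1=0, s2=1, den=0)
internal_count = (
    len(signals["mux_select"])
    + len(signals["tri_enable"])
    + len(signals["first_selector"])
    + 1  # the single second-selector control
)
```

The count of derived signals is 4 + 4 + 2 + 1 = 11, while only six values are supplied from outside the switching element.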

[0103] Data Transfer Functions

[0104] Embodiments of the switching element may be arranged for different data transfer capabilities, examples of which will now be described with reference to FIGS. 16 and 17.

[0105] In one implementation, each switching element may be capable oftransferring data from its local memory space to the input of aprocessing element associated with a neighbouring switching element.Referring to FIG. 16, this implementation is provided by coupling theoutput of the local memory space to the input port 325, and by couplingthe output port 329 to an input of the local processor element. In asecond implementation, each switching element may be arranged to permitthe transfer of data from a register (or arithmetic logic unit (ALU)output) to the input of a processor element (e.g. register or ALU)associated with a neighbouring switching element. Again, referring toFIG. 16, this functionality may be implemented by coupling the input 325of each switching element to an output of its local processor element(e.g. register or ALU output), and coupling the output port 329 of theswitching element to an input (e.g. register input or ALU input) of itslocal processor element.

[0106] In a third implementation, each switching element may be arranged to enable the transfer of data from a processor element associated with a neighbouring switching element to its local memory. Referring again to FIG. 16, this functionality may be implemented by coupling the input 325 of the switching element to an output (e.g. register or ALU) of its local processor element, and coupling the output 329 of each switching element to an input of its local memory.

[0107] In a fourth implementation, each switching element may be adaptedfor transferring data either from its local memory space or from anoutput from its local processor element (e.g. register or ALU output) toa processor element associated with a neighbouring switching element(i.e. a combination of the first and second implementations describedabove). Referring again to FIG. 16, this functionality may beimplemented by providing a 2 to 1 selector switch (multiplexer) 302which is switchable between two inputs 304, 306 and coupling the localmemory to one of the inputs 304 and an output of the local processorelement to the other input 306. The output 308 of the selector switch302 is coupled to the input 325 of the switching element, and the output329 of the switching element is connected to an input of its localprocessor element.

[0108] In a fifth implementation, each switching element may be arrangedto transfer data from its local memory space to a PE associated with aneighbouring switching element or to transfer data from a PE associatedwith a neighbouring switching element to its local memory space (whichis a combination of the first and third implementations describedabove). Referring to FIG. 17, this functionality may be implemented byproviding a 2 to 1 selector switch 302 having switchable inputs 304,306, one connected to the local memory and the other connected to anoutput of the local processor element, as for the embodiment shown inFIG. 16 and connecting the output 308 of the selector switch 302 to theinput 325 of the switching element. This arrangement enables dataderived locally either from the local memory or local processor elementto be transferred out of the switching element to a neighbouringswitching element. The output port 329 of the switching element isarranged to be coupleable to an input of its local PE or an input of itslocal memory. Referring to FIG. 17, the output 329 of the switchingelement is connected to one of the two input ports of a second 2 to 1output selector switch 341, the output of which is connected to an inputof the local processor element. The other input is connected to thelocal memory, for example via the input port 325 of the switchingelement to enable transfer from local memory to local processor element.The output port 329 is also coupled to one of the two inputs of a third2 to 1 output selector switch 342 whose output is connected to the localmemory, to enable incoming data from a neighbouring switching element tobe written to the local memory. The second input of the third selectorswitch 342 is connected to an output of the local processor element toenable data output from the local PE to be transferred directly to thelocal memory.

[0109] It is to be noted that, in addition to providing a read functionfrom a local memory to a processor element of a neighbouring switchingelement, and a write function from the local memory of a neighbouringswitching element to its local memory, the circuit of FIG. 17 alsoenables the transfer of data between a local processor element and aneighbouring processor element. This functionality may be regarded as acombination of the first, second and third implementations.

[0110] In a seventh implementation, each switching element may be arranged for selectively either transferring data from its local processor element to the processor element of a neighbouring switching element or transferring data from its local processor element to the memory associated with a neighbouring switching element. Referring again to FIG. 17, this functionality may be implemented by coupling an output of a local processor element (e.g. register or ALU) to the input 325 of the switching element, and selectively coupling the output of the switching element to one of the local processor element and the local memory, which may for example be achieved by the same output circuit arrangement shown in FIG. 17.

[0111] FIG. 18 shows another embodiment of a switching element which is capable of transferring data between its nearest eight neighbours, in the row, column and diagonal directions of a two dimensional array. The switching element 401 has first, second, third and fourth input/output ports 403 (N), 405 (E), 407 (S) and 409 (W), each of which may be connected to one port of a similar switching element of its nearest four neighbours. These input/output ports have a similar function to the input/output ports of the embodiments of the switching elements described above. The switching element 401 also includes an input 425, which may for example be coupled to either one or both of a local memory and local processor associated with the switching element, and an output 429, which may be coupled to one or both of a local memory and local processor element associated with the switching element.

[0112] The switching element 401 has a first group 431 of four switchesL, J, K and I which enable data to be transferred from one input/outputport to an adjacent input/output port when the switching element isplaced in by-pass mode. The switching element 401 further includes asecond group 433 of switches P, O, C and M, a respective one of which isswitchable to a respective pair of switches of a third group 435 ofswitches containing four switch pairs XA, VU, TB and RQ. The first twopairs of switches in the third group, XA and VU determine which of thefirst and second input/output ports couples to the input 425 and theoutput 429. Similarly, the third and fourth switch pair of the thirdgroup TB, RQ determine which of the third and fourth input/output ports407, 409 couple to the input port 425 and output port 429. A fourthgroup 437 of four switches Z, Y, {overscore (Z)} and {overscore (Y)} areconnected such that the first switch Z is connected to the terminal ofthe first switch in each of the first and second switch pairs of thethird group of switches X, V, and the second switch Y of the fourthgroup is connected to the switch terminals of the second switch of thefirst and second switch pairs A, U. Thus, the first and second switches{overscore (Z)}, {overscore (Y)} of the fourth group 437 together withthe first two switch pairs of the third group 435 selectively coupleeither the first and second input/output ports 403, 405 to either theinput port 425 or the output port 429. Similarly, the third switch Z ofthe fourth group 437 of switches is connected to the terminals of thefirst switches T, R in each of the third and fourth switch pairs of thethird group of switches, and the fourth switch Y of the fourth group isconnected to each of the terminals of the second switches B, Q of thethird and fourth switch pairs. 
The third and fourth switches Z, Y of the fourth group, in conjunction with the third and fourth switch pairs of the third group 435 selectively couple the third or fourth input/output port 407, 409 to one of the input and output ports 425, 429. An input access switch 439 is provided between the input 425 and the terminals of the first and third switches Z, {overscore (Z)} of the fourth group to switchably couple the input 425 for an inter-switching element transfer, and an output access switch 441 is coupled to the terminals of the second and fourth switches Y, {overscore (Y)} of the fourth group of switches for coupling the output of the switching element for inter-switching element transfer. A local access switch 443 is also provided for direct local access between the input 425 and the output 429, for example to enable direct local transfer between a local memory and a local processor element. It is to be noted that in operation, when the local access switch 443 is closed, both input and output access switches 439, 441 are open, and vice versa.

[0113] One of the main differences between this embodiment and the embodiments described previously is that the input and output 425, 429 are switchably connected through a local access switch 443, which may comprise a toggle switch, instead of a 2 to 1 by-pass multiplexer. A second difference between the switching element structure of FIG. 18 and that of the previously described embodiments is that the input 425 is coupleable to each of the first, second, third and fourth input/output ports through a branched communication network rather than directly through a switch.

[0114] FIG. 19 shows a switching element according to another embodiment of the present invention. The switching element 402 is similar to that shown in FIG. 18, and like parts are designated by the same reference numerals. The main difference between this switching element 402 and that shown in FIG. 18 is that the second group 433 of switches P, O, C and M has been replaced by tri-state toggle switches P, Q, R and T, each of which has a first position which connects each of the first, second and fourth input/output ports 403, 405, 409 directly to the input 425 of the switching element via an input access switch 439. Each of the switches is switchable to a second position to place the switching element in one of four modes, in which a selected one of the four input/output ports 403, 405, 407, 409 is coupled to the output port 429. The switching element 402 further includes a third group 435 of switches 404, 406, 408, 410, the first two of which select between the first and second input/output ports 403, 405 and the third and fourth of which select between the third and fourth input/output ports 407, 409. In this embodiment, the second switch is controlled by the inverse of the control signal applied to the first switch, and the third and fourth switches 408, 410 are controlled by the same control signals as the first and second switches 404, 406, so that the third group of switches has two outputs, one of which is coupled to one of the first and second input/output ports 403, 405 and the other of which is connected to one of the third and fourth input/output ports 407, 409.

[0115] The switching element 402 further includes a fourth group 437 of two switches 412, 414, which selectively couple one of the two outputs of the third group of switches to the output port 429 of the switching element via the output access switch 441. In this embodiment, the fourth group 437 of two switches may conveniently be controlled by applying a desired control signal to one of the switches and the inverse of the control signal to the other switch.

[0116] Returning to FIG. 3, a data processor apparatus according to anembodiment of the present invention, which may comprise a SIMD processorcomprises an array of computational units, each having access to anassociated local memory segment contained within a memory block. Thedata processor further includes an array of switching elements, whichmay comprise any switching elements described above, as well as others,which enables data to be transferred between one computational unitand/or its local memory and another computational unit and its localmemory. The array of computational units and the memory block may haveany size. However, the size of the memory block may be limited dependingon the application of the processor, since the access time for a read orwrite operation from or to a memory generally increases with memorysize. The power drawn by the memory for read and write operations alsoincreases with memory size. Memory access time may also depend on otherfactors such as the number of I/O ports. Therefore, a memory of a givensize will have a minimum access or latency period. Advantageously,embodiments of the present invention provide a data processor apparatusin which the effective memory size may be increased to that required bythe particular application, while retaining the desired memory accessspeeds and reducing power requirements. In one embodiment, this isachieved by selecting a memory block which provides the required memoryaccess speeds and incorporating into the data processor as many memoryblocks as necessary to provide the required memory size. An embodimentof a data processor having a plurality of memory blocks is shown in FIG.20.

[0117] Referring to FIG. 20, the data processor apparatus 501, which may comprise a SIMD processor, comprises a first, second, third and fourth memory block 503, 505, 507, 509, each having a respective associated group 511, 513, 515, 517 of computational units 519. Each computational unit comprises one or more processor elements, and may comprise for example 8, 16 or 32 processor elements for performing 8, 16 or 32 bit word processing. Each computational unit has an associated group of switching elements 521 which is arranged to provide data transfer between the computational unit (or its associated memory) and at least one other computational unit (or associated memory). In the present embodiment, the computational units 519 of each group 511, 513, 515, 517 are arranged in a two-dimensional array having two rows 523 (Row 1) and 525 (Row 2) and n columns 527. The groups of switching elements associated with each row of computational units are also arranged in rows 527 (Row 1), 529 (Row 2), and each group of switching elements in each row is connected to its adjacent group(s) of switching elements to provide lateral, or east to west (or west to east), data transfer between their associated computational units contained in each row 523, 525. The two groups of switching elements in each column are also interconnected to provide north to south and south to north data transfer between the two computational units in each column, as indicated by the arrows 531.

[0118] In this embodiment, the first two-dimensional array ofcomputational units associated with memory block A 503 is effectivelyexpanded from an array containing two rows and n columns into an arraycontaining four rows and 2n columns by adding a second memory block B505 and second array of associated computational units 513 below thefirst memory block 503, and adjacent third and fourth memory blocks 507,509 and associated arrays of computational units 515, 517 to the side.Groups of memory elements in corresponding columns and contained in thesecond row associated with memory block A and the adjacent first rowassociated with memory block B are interconnected, as shown by thearrows 533, across memory block B to enable data to be transferredbetween the second row of computational units or their local memory,associated with memory block A and the first row of computational unitsor their local memory associated with memory block B.

[0119] The groups of switching elements contained in the nth columnassociated with memory blocks A and B are connected to the groups ofswitching elements of corresponding rows contained in the first columnof the two groups of switching elements associated with memory blocks Cand D, as indicated by the arrows 535, to enable data to be transferredbetween computational units (and/or their associated memory), containedin the nth column associated with memory blocks A and B, and theadjacent first column associated with memory blocks C and D. Thus, amodule containing a memory block of the required size and an associatedarray of computational units may be expanded into an enlarged integratedarray, by adding any number of further modules simply by interconnectingthe groups of switching elements at the edges of adjacent modules, asshown in FIG. 20.

[0120] In designing a module, the size of memory segment per I/O port and the number of computational units required are generally determined by the application that is to be run on the processor. For example, for MPEG4 decoding, the inventors have determined that four kilobits of memory per I/O port and an array of four by twenty-four eight bit computational units are required. The number of required rows in each memory depends on the column decoder selected. For example, using an eight to one column decoder (or multiplexer), a memory having 512 rows is required to obtain 4 kb per I/O port (8×512=4,096 bits, i.e. 4 kb). Having determined the number of rows and the required array size, the number of I/Os in a memory can then be determined.
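The row-count arithmetic in the preceding paragraph can be checked with a short sketch. The function names are hypothetical, and the sketch assumes only the relationship stated in the text, namely that the storage per I/O port equals the column decoder ratio multiplied by the number of rows.

```python
def bits_per_io(column_decoder_ratio: int, rows: int) -> int:
    """Storage reachable through one I/O port: columns per port times rows."""
    return column_decoder_ratio * rows


def rows_required(bits_needed_per_io: int, column_decoder_ratio: int) -> int:
    """Smallest row count giving at least the requested bits per I/O port."""
    # Ceiling division, so a non-exact ratio rounds the row count up.
    return -(-bits_needed_per_io // column_decoder_ratio)
```

With the figures given for the MPEG4 example (an eight to one column decoder and 4,096 bits per I/O port), `rows_required(4096, 8)` evaluates to 512, matching the stated memory depth.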

[0121] For example, if the processor runs at 100 MHz, a memory read must be completed in 10 ns. The inventors have found, after simulation, that a memory with 192 I/Os and a depth of 512 rows has a latency period of 4 ns. Each switching element may have a latency of, for example, 1.5 ns, and therefore the latency period of a north to south or east to west transfer, which requires two switching elements, is about 3.0 ns, and the latency period of a diagonal (i.e. NE, SE, SW, NW) read, which requires three switching elements, is about 4.5 ns. Adding the memory latency period of 4 ns to the worst case switching element latency (i.e. a diagonal read) gives 8.5 ns, so data can be transferred to a neighbouring computational unit with at least 1.5 ns to spare.
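The timing budget above may be expressed as a small calculation. The constant and function names are illustrative assumptions; the numerical values are the example figures quoted in the paragraph (100 MHz clock, 4 ns memory latency, 1.5 ns per switching element, two switching elements for a row or column transfer and three for a diagonal transfer).

```python
CLOCK_HZ = 100e6          # example processor clock from the text
MEMORY_LATENCY_NS = 4.0   # simulated latency of the 192 I/O, 512 row memory
SWITCH_LATENCY_NS = 1.5   # example latency of one switching element


def slack_ns(switching_elements_traversed: int) -> float:
    """Time left in one clock cycle after memory and switch latencies."""
    cycle_ns = 1e9 / CLOCK_HZ
    used_ns = MEMORY_LATENCY_NS + switching_elements_traversed * SWITCH_LATENCY_NS
    return cycle_ns - used_ns


# Two switching elements for N/S or E/W, three for a diagonal transfer.
row_or_column_slack = slack_ns(2)
diagonal_slack = slack_ns(3)
```

Under these example figures, the diagonal transfer is the worst case and leaves 1.5 ns of the 10 ns cycle unused, while a row or column transfer leaves 3.0 ns.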

[0122] Thus, by adopting a processor architecture in which the required size of memory is provided by a plurality of memory blocks, the performance of the processor can be optimized for the application, and at the same time scaled to any size.

[0123] A data processor according to another embodiment of the present invention comprises a plurality of processor elements associated with a memory segment. In one embodiment, each processor element is coupled to the same memory port (for at least one of read and write memory access). An input port of each processor element may be coupled to a common read line, which is coupled to a memory output port and/or an output port of each processor element may be coupled to a common write line, which is coupled to a memory input port, which may be associated with the same or different memory segment as the input port. The common read access line and/or the common write line to which the processor elements are connected may be coupled to a respective memory input and output port through a switching element, which enables other processor elements associated with one or more different memory segments to access one or more of the processor elements and their associated memory segments.

[0124] The data processor may include at least one register for each processor element, each coupled to a common read access line, each of which may be selectively enabled to receive data from the common access line in order to control which processor element has access to the memory segment at any one time. The input registers may be controlled in conjunction with the memory decoders such that either the same or different data from the memory is read into different processor elements. An embodiment of a data processor having multiple processor elements associated with a memory segment is shown in FIG. 21.

[0125] Referring to FIG. 21, a data processor 601 comprises a memory 603 (for example a RAM) having a plurality of columns 607 of storage elements. The memory may have a plurality of segments 609 (only one of which is shown in FIG. 21), each segment having a plurality of columns of storage elements connected to a column decoder 611, which has a memory port 613 (e.g. an I/O port, which may include separate terminals for read and write access or a common read/write access terminal). The memory 603 also includes a row decoder 615 for selecting a row of storage elements for read and/or write access. The data processor 601 comprises a plurality of processor elements 617, 619, 621 each having an input port 623, 625, 627 coupled to a common read access line 629, which is coupled to the memory port 613 via an optional switching element 631 which may be switchable either to couple the local read access line 629 to the local memory port 613, or to connect one or more other processor elements and/or their associated memory segments to the local memory segment 609 or to the local processor elements 617, 619, 621.

[0126] Each processor element 617, 619, 621 has at least one and in thepresent embodiment two input registers 633, 635 coupled to a respectiveprocessor input port 623, 625, 627 for temporarily storing data outputfrom the memory before being processed by the respective processorelement. Data write access to each register to enable or disable eachregister from receiving data on the common read access line 629 iscontrolled by a control signal applied to a control input port 637. Eachregister is individually controllable by a control signal. The controlsignals may be applied by any suitable device, for example a memoryaccess controller (MAC, not shown). Although in another embodiment,simple switches may be used to selectively couple the input port of eachprocessor element to the common read line, the provision of one or moreregisters enables a processor element to operate on the data containedin its register(s), while another processor element performs a memoryaccess and data is downloaded into its register(s). The provision of twoor more registers for one or more processor elements enables the or eachprocessor elements to process data stored in one register and at thesame time make a memory access so that data from the memory isdownloaded into the second register (or one of its other registers, ifthe processor element has more than two registers) so that data to beprocessed is pre-fetched. In this way, each register of a PE may bealternately selected for (1) receiving data from the memory and (2) foroutputting data to be processed by the PE, for example to its ALU.
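The alternating use of two input registers described above resembles a double-buffering (ping-pong) scheme, which may be sketched as follows. The class, method and variable names are hypothetical, and the sketch models only the ordering of loads and operations, not the control signals or timing of any actual circuit.

```python
class ProcessorElementModel:
    """Illustrative model of a PE with two input registers used alternately."""

    def __init__(self):
        self.registers = [None, None]
        self.load_index = 0  # register currently enabled to receive memory data

    def load(self, value):
        # Models the write enable for the selected register being asserted
        # while data is present on the common read access line.
        self.registers[self.load_index] = value

    def swap(self):
        # Exchange the roles of the two registers.
        self.load_index ^= 1

    def operand(self):
        # The register not being loaded supplies data, e.g. to the ALU.
        return self.registers[self.load_index ^ 1]


def run(stream, operation):
    """Process a stream, pre-fetching the next item while using the current one."""
    pe = ProcessorElementModel()
    results = []
    iterator = iter(stream)
    try:
        pe.load(next(iterator))
    except StopIteration:
        return results
    for next_value in iterator:
        pe.swap()            # the register just loaded becomes the operand
        pe.load(next_value)  # pre-fetch into the other register
        results.append(operation(pe.operand()))
    pe.swap()
    results.append(operation(pe.operand()))
    return results
```

In this model each memory access for the next data item is issued before the operation on the current item, so a real implementation could overlap the two, which is the benefit the paragraph attributes to providing two or more registers.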

[0127] In the present embodiment, the memory port 613 is switchably coupled to another line 639 via a switch 641, which may for example comprise a tri-stateable inverter, and may be controlled by any suitable device, e.g. the same or different device used to control the registers 633, 635. Advantageously, this arrangement enables the memory segment to be accessed by another device, such as a data processor. The provision of one or more registers 633, 635 coupled to the input port of each processor element enables the processor elements to process data from the memory, while the memory is being accessed by another device, via the other read access line 639.

[0128] The processor architecture of the embodiment of FIG. 21 offers a designer a wide range of possible operating modes. For example, the present configuration allows a page mode memory access. In page mode, a single row of memory (or segment of memory) is selected, and columns within the same memory segment are then successively selected by the column decoder (column access strobe (CAS) signal), while the row access strobe (RAS) signal remains active. As each data bit appears on the common read line 629, the input registers of the processor elements may be successively activated so that for example a bit from each successive memory column is written to a register of successive processor elements. In one example, a byte of data may be stored in a row of storage elements, and the data processor may include eight or more processor elements. Different bits of the byte may be read into different processor elements, and the processor elements may be arranged to perform a parallel operation on the byte word. A second byte, for example from a second row of storage elements, may also be read from the memory, for example using page mode, and different bits of the second byte also read into different processor elements, which may then perform an operation on both bytes, for example by making a comparison calculation, or any other operation.
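The page mode example above, in which successive bits of a row are delivered to successive processor elements and two bytes are then compared, may be sketched as follows. The memory is modelled as a list of rows of bits, and the function names and the choice of a bitwise exclusive-OR as the comparison are assumptions for illustration only.

```python
def page_mode_read(memory, row, n_processor_elements):
    """Select one row once, then step through its columns.

    Each successive column bit is latched by the register of the next
    processor element, modelling successive register enables.
    """
    selected_row = memory[row]          # row stays selected (RAS held active)
    registers = [None] * n_processor_elements
    for column in range(n_processor_elements):   # successive column selects
        registers[column] = selected_row[column]
    return registers


def compare_bytes(memory, row_a, row_b, n_processor_elements=8):
    """Each processor element compares its own bit of the two bytes."""
    bits_a = page_mode_read(memory, row_a, n_processor_elements)
    bits_b = page_mode_read(memory, row_b, n_processor_elements)
    # Per-element result: 1 where the bits differ, 0 where they match.
    return [a ^ b for a, b in zip(bits_a, bits_b)]


example_memory = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
]
difference = compare_bytes(example_memory, 0, 1)
```

In this example the two stored bytes differ only in the sixth column, so only the sixth processor element reports a mismatch.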

[0129] In this embodiment, each memory segment includes one or more columns of storage elements, and may include any number of processor elements, for example 2, 4, 16, 32, or any other number.

[0130] Returning to FIG. 21, the data processor 601 also includes circuitry to enable each processor element to output data to its local memory, or optionally, another memory segment or another processor element, associated with that memory segment, through the switching element 631.

[0131] Each processor element 617, 619, 621 has an output port 645, 647, 649 each coupled to a respective output line 651, 653, 655. The output port of each processor element may be coupled to, for example, the output of an ALU, a register (e.g. internal register) or another element of the processor element, or may be coupleable to any one of these. The output circuit further includes a first output selector switch 657 coupled to each of the output lines 651, 653, 655, which switchably connects one of the output lines to the output port 659 of the selector switch 657. The output 659 of the selector switch 657 may be coupled to the switching element (SE) 631, which may be arranged to enable data from the processor elements to be written into their local memory 609, or into one or more other parts of the memory or into one or more other processor elements. Alternatively, the output of the first selector switch 657 may be coupled directly to the local memory input port (or to another memory input port). In the present embodiment, a second selector switch 661 is provided having two inputs 663, 665, one of which 663 is coupled to the output port 659 of the first selector switch 657, and the other being connected to another line 667 which may be used to receive data from another device. The second selector switch 661 is controlled by appropriate control signals applied (by any device, e.g. the array controller) to its control input port 669 for connecting one of its input ports to its output port 671. The output port of the second selector switch 661 may be connected to the switching element 631, or directly to the memory input port 613. (Alternatively, the output port 671 may be connected to any other component, for example, of the data processor, such as a memory segment of the same or different memory, or a different processor or other device.)

[0132] An embodiment of a switching element which may be implemented as the switching element 631 in the embodiment of FIG. 21 is shown in FIG. 22. The switching element 631 has an input port 673, first and second input/output ports 675, 677 which are switchably coupled to the input port 673 by a respective first and second switch 679, 681, and an output port 683 which is switchably coupled to the input port 673, and the first and second input/output ports 675, 677 via a selector switch 685. The input port may be coupled to the local memory port (613 in FIG. 21), to one or more outputs of the processor elements (e.g. to one or more of the output lines 651, 653, 655 in FIG. 21, for example via the first or second selector switches 657, 661 in FIG. 21), or may be selectively coupled to either of the local memory or the output from one or more processor elements.

[0133] The output port 683 of the switching element may be coupled to the input port of one or more processor elements, and may, for example, be coupled to the common read access line 629 in FIG. 21 or may be coupled to the local memory input port (for example memory port 613 in FIG. 21), or switchably coupled to either.

[0134] A series of switching elements, for example switching elements 631, 685, 687, may be coupled together to enable the lateral transfer of data between one unit, comprising, for example, a memory segment and a plurality of local processor elements, associated with one switching element, and another unit associated with another switching element.

[0135] The combination of switching elements which allow east to west and west to east data transfer, and the interconnected processor elements associated with each switching element, enables data to be transferred between a processor element (or its associated memory) and any one of its eight neighbours (processor elements or their local memory) (or more remotely) in the north to south and south to north directions, the east to west and west to east directions and both diagonal directions (i.e. Northeast to Southwest and vice versa and Northwest to Southeast and vice versa).

[0136] Another embodiment of a data processor having an architecturecomprising a plurality of units, each having a memory segment and aplurality of processor elements associated with a memory segment, isshown in FIG. 23. The data processor 701 comprises a memory 703 having aplurality of column segments 705 which are divided into a plurality ofrow segments 707. Each column segment has an associated memory port 709to enable data to be at least one of read from and written to the memorysegment. The data processor further includes a plurality of processorelements 711, 713, 715, 717 associated with each column segment 705 ofthe memory, and which in this embodiment are arranged in column groups719, 721, 723 etc., although processor elements associated with aparticular memory segment may be arranged in any other configuration.Each group 719, 721, 723 of associated processor elements areinterconnected to a common read line and/or to a common output line, forexample as for the embodiment of FIG. 21. Each group of processorelements is coupled to the memory port of its respective memory segmentby a switching element 725, which may for example be similar to theembodiment of the switching element shown in FIG. 22. In one embodiment,the switching elements may be interconnected to enable the transfer ofdata between neighbouring groups of processor elements and theirassociated memory segments, in applications where the data processor isrequired to operate on single bit data. However, in the presentembodiment, adjacent switching elements are not interconnected, but areprovided for another interconnection scheme which allows multiple bitwords to be transferred between different elements of the dataprocessor, as will be described below.

[0137] In the present embodiment, each row (0 to n) of processorelements has a corresponding row segment or slice (0 to n) of memory. Inoperation, each row of processor elements may be arranged to read datafrom its corresponding memory slice one at a time and in any order. Forexample, the processing elements in row 0 may first be enabled to readfrom their row 0 memory slice, then the processor elements in row 1 maybe enabled to access their corresponding row 1 memory slice, (forexample in the next cycle), until all required rows of processorelements have accessed their corresponding row memory slice. Successivememory accesses by successive rows of processor elements may be referredto as pipelined accesses. In processing the data, a row of processingelements may begin processing the data once the data is received, orwait until one or more other rows of processing elements have alsoreceived data to be processed. In either case, data may be processed byeach row of processing elements at different times (i.e. in differentcycles) or simultaneously in the same cycle or cycles.
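The pipelined accesses described above, in which rows of processor elements read their corresponding memory slices one at a time and in any order, may be sketched as a simple schedule. The function name, the one-access-per-cycle assumption and the data representation are illustrative only.

```python
def pipelined_accesses(memory_slices, order=None):
    """Let each row of processor elements read its own slice, one per cycle.

    `order` lists the row indices in the sequence they are enabled; it
    defaults to row 0, row 1, and so on. Returns the (cycle, row) trace and
    the data received by each row.
    """
    if order is None:
        order = range(len(memory_slices))
    trace = []
    received = {}
    for cycle, row in enumerate(order):
        received[row] = memory_slices[row]  # row reads only its own slice
        trace.append((cycle, row))
    return trace, received


slices = ["slice0", "slice1", "slice2"]
default_trace, default_data = pipelined_accesses(slices)
reordered_trace, reordered_data = pipelined_accesses(slices, order=[2, 0, 1])
```

Whether a row starts processing as soon as its data arrives or waits for other rows is a separate choice, as the paragraph notes; the sketch records only when each row obtains its data.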

[0138] The embodiments of the data processor shown in FIGS. 21 and 23allow multiple processing elements to access the memory via a singleswitching element. This arrangement may improve the timing fortransferring data from a given memory space to a number of processorelements since the latency period is that associated with a singleswitching element, rather than multiple switching elements. Thisarrangement also reduces the time required for inter-processing elementdata transfer between associated processor elements (for example in thesame column), since the data can be transferred from any processorelement to any other associated processor element through a singleswitching element, in other words, by a left or right shift North/Southexcept does not require a switching element. In one embodiment, theoutput port or line of one or more processor elements may be switchablyconnected to the input port of any other processor element by by-passingthe switch as shown for example by the switch 691 between the outputline 651 of the first processing element and the common read line 629 ofthe embodiment shown in FIG. 21. The output or line of other processorelements may be similarly connected to the common read access line 629.

[0139] In one embodiment, each row of processor elements may be arranged to perform parallel processing on multiple bit data, for example byte wide data or multiple byte wide data (e.g. containing 16, 32, 64, etc. bits). The processor elements may be arranged in groups of eight, thereby forming a computational unit capable of performing processing operations on byte length data. To enable data bytes to be transferred between one computational unit and another, each switching element 725 of the embodiment shown in FIG. 23 is connected to a corresponding switching element of a neighbouring computational unit (either to the right or to the left or both) to enable bits of the same significance to be transferred between computational units.

[0140] In other embodiments, the computational units may have any number of processor elements to enable the computational units to process data of any length, for example 16 bit data, 24 bit data, 32 bit data etc. In another embodiment, two or more computational units may be coupled together to enable variable length word processing. In this case, a switching element may be switchably connected to two or more other switching elements for transferring bits of the same significance in a data transfer, depending on the length of the word. For example, corresponding switching elements of neighbouring eight bit computational units may be interconnected to enable the transfer of data bytes between neighbouring computational units. Each switching element of a computational unit may also be switchably connected to an eight bit computational unit immediately adjacent its neighbouring computational unit to enable 16 bit data transfers between pairs of computational units acting together as a 16 bit parallel processor. This arrangement may be extended to enable data transfers between computational units of words of any desired length.
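
The transfer of bits of the same significance between computational units can be sketched in software as follows. This is a behavioural illustration only, not the circuit of FIG. 23: the helper names `to_bits`, `from_bits` and `shift_units`, the least-significant-bit-first ordering, and the choice that units at the edge of the row receive zeros are assumptions made here.

```python
def to_bits(value, width=8):
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from bits stored least significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))


def shift_units(units, word_units=1, direction=1):
    """Move each unit's bits to the unit `word_units` positions away.

    `units` is a list of computational units, each a list of bits held
    by its processor elements. Bit i of the source unit always lands on
    bit i of the destination unit, i.e. bits of the same significance
    are exchanged. word_units=1 models byte transfers between
    neighbouring units; word_units=2 models 16 bit transfers between
    pairs of units. Assumption: units with no source receive zeros.
    """
    count = len(units)
    width = len(units[0]) if units else 0
    stride = word_units * direction
    result = [[0] * width for _ in range(count)]
    for src in range(count):
        dst = src + stride
        if 0 <= dst < count:
            for bit in range(width):
                result[dst][bit] = units[src][bit]
    return result
```

For example, with four eight bit units holding 0x12, 0x34, 0x56 and 0x78, `shift_units(units, 1)` moves each byte one unit to the right, while `shift_units(units, 2)` moves each 16 bit pair as a whole by skipping over the immediate neighbour, mirroring the connection to the unit immediately adjacent the neighbouring unit.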

[0141] The processing capacity of the data processor described above and shown in FIG. 23 may be expanded to any desired size, for example by adding and interconnecting further blocks of memory and associated computational units. An example of a data processor containing four interconnected memory blocks and associated computational units is shown in FIG. 24. Referring to FIG. 24, the data processor 801 has four memory blocks 803, 805, 807, 809 each having an associated array of computational units 811, 813, 815, 817. Each computational unit may comprise one or more processor elements, for example eight processor elements for byte length word processing. The computational units are arranged in two-dimensional arrays of rows and columns. Each column 819 of computational units has an associated switching element (if each computational unit has only one processor element) or group of switching elements (if each computational unit has more than one processor element) 821. The switching elements or groups of switching elements 821 are arranged in rows, adjacent switching elements or groups of switching elements being coupled to the or each neighbouring switching element or group of switching elements to enable data to be transferred laterally between computational units in different (e.g. neighbouring) columns. The switching elements or groups of switching elements in corresponding columns of the first and second arrays of computational units 811, 813 are also connected to allow data to be transferred between computational units and/or their associated memory space of the first and second units, as indicated by the arrow 821, and the switching elements or groups of switching elements of corresponding columns of computational units associated with the third and fourth arrays of computational units 815, 817 are also interconnected, as shown by the arrows 823, to allow data to be transferred between the third and fourth units.

[0142] Although in the embodiment of FIG. 23, a single memory block may be used to serve all of the arrays of computational units, it may be preferable to provide a number of smaller, separate memory blocks to give the required memory space in order to reduce memory access times and to provide other advantages as described in connection with the embodiments shown in FIG. 20.

[0143] Embodiments of the data processor apparatus may be formed on a single die using any integrated circuit fabrication technique. Embodiments of the data processor apparatus are particularly applicable to SIMD processor operation and allow one- or two-dimensional arrays of data of any size to be processed in parallel.

[0144] In other embodiments of the present invention, any one or more elements described above in connection with one embodiment may be incorporated in any other embodiment or combined with any other element or component.

[0145] Modifications to the embodiments described above will be apparent to those skilled in the art.

1. A data processor apparatus, comprising: a first processor element and a second processor element, a memory having a first part and a second part, said first processor element being coupleable to said first part for at least one of read and write access, said second processor element being coupleable to said second part for at least one of read and write access, and an access switch for selectively coupling said first processing element to one of said first part, for at least one of read and write access, and said second part, for at least one of read and write access.
2. A data processor apparatus as claimed in claim 1, further comprising a third processor element, and wherein said memory further comprises a third part, said third processing element being coupleable to said third part for at least one of read and write access, and a second access switch for selectively coupling said second processor element to one of said second part and said third part.
3. A data processor apparatus as claimed in claim 2, further comprising an access controller for controlling said first and second access switches, such that when said first switch couples said first processor element to said first part, said second switch couples said second processor element to said second part, and when said first switch couples said first processor element to said second part, said second switch couples said second processor element to said third part.
4. A data processor apparatus as claimed in claim 3, wherein said second part is positioned between said first and third parts.
5. A data processor element as claimed in claim 4, wherein said first, second and third parts each contain a bit of different respective multiple bit words, wherein each bit has the same significance in the respective word.
6. A data processor apparatus as claimed in any preceding claim, wherein at least one of the first and second parts comprises a column of memory elements.
7. A data processor apparatus as claimed in any preceding claim, further comprising coupling means for coupling said first processor element to said second processor element to enable the transfer of data from at least one of said first processor element to said second processor element and from said second processor element to said first processor element.
8. A data processor apparatus as claimed in claim 7, wherein said first access switch is switchable to permit data to be transferred from at least one of said first processing element to said second processing element, and from said second processing element to said first processing element.
9. A data processor apparatus as claimed in any preceding claim, wherein said first access switch comprises a first port coupleable for receiving data from said first part, a second port coupleable to said first port, a third port, and a fourth port coupleable to said third port.
10. A data processor apparatus as claimed in claim 9, wherein said second and third ports are selectively coupleable to said first and fourth ports.
11. A data processor apparatus as claimed in claim 10, wherein said first access switch is arranged such that when said first port is coupled to one of said second and third ports, said fourth port is coupled to the other of said second and third ports.
12. A data processor apparatus as claimed in any one of claims 9 to 11, wherein said first port is coupleable to an output of said first processor element.
13. A data processor element as claimed in claim 12, further including switching means for selectively switching one of said first part and an output of said first processing element to said first port.
14. A data processor element as claimed in any one of claims 9 to 13, wherein said fourth port is coupleable to at least one of said first processor element and said first part.
15. A data processor element as claimed in claim 14, further comprising switching means for selectively coupling said fourth port to one of said first processor element and said first part.
16. A data processor apparatus comprising a plurality of processor elements, a memory having a plurality of parts, each different part being coupleable to a different one of said plurality of processor elements, and switch means for switch/coupling at least one of said processor elements from its associated memory part to the memory part associated with at least one other processing element.
17. A data processor apparatus as claimed in claim 16, wherein said switch means comprises at least one switching element having a first port coupled to its associated memory part, a first switch for switchably coupling said memory to its associated processor element, a second port, and a second switch for switchably coupling said memory to said second port.
18. A data processor apparatus as claimed in claim 17, wherein said second switch is switchable between said second port and an input terminal.
19. A data processor apparatus as claimed in claim 17 or 18, wherein said switching element further comprises a third port and a third switch for switchably coupling said third port to said memory.
20. A data processor apparatus as claimed in claim 19, wherein said third switch is switchable between said third port and a second terminal.
21. A data processor apparatus as claimed in claim 20, further comprising a controller arranged such that when said second switch couples said memory to said second port, said third switch either couples said third port to said second input terminal, or isolates said third port from said input terminal and said memory.
22. A data processor apparatus as claimed in claim 20 or 21, comprising a controller arranged such that when said third switch couples said memory to said port, said second switch either couples said input terminal to said second port or isolates said second port from said memory and said input terminal.
23. A switching element for switchably coupling an array of circuit elements each having an input and an output, the switching element comprising: an input for coupling to the output of a circuit element, an output for coupling to an input of a circuit element, and first and second switch means, said first switch means having a first state in which said first port is coupled to said input port, and a second state in which said first port is coupled to said output port, and said second switch means having a first state in which said second port is coupled to said input port and a second state in which said second port is coupled to said output port.
24. A switching element as claimed in claim 23, wherein said first switch means has a third state in which said first port is isolated from said input port and said output port.
25. A switching element as claimed in claim 23 or claim 24, wherein said second switch means has a third state in which said second port is isolated from said input port and output port.
26. A switching element as claimed in any one of claims 23 to 25, further including means for switchably coupling said input port to said output port.
27. A switching element as claimed in any one of claims 23 to 26, further comprising a third port and third switch means having a first state for coupling said third port to said input port and a second state for coupling said third port to said output port.
28. A switching element as claimed in claim 27, wherein said third switch means has a third state in which said third port is isolated from said input port and said output port.
29. A switching element as claimed in any one of claims 23 to 28, further comprising a fourth port, and fourth switch means having a first state for coupling said fourth port to said input port and a second state for coupling said fourth port to said output port.
30. A switching element as claimed in claim 29, wherein said fourth switch means has a third state in which said fourth port is isolated from said input port and said output port.
31. A switching element as claimed in claim 29 or 30, wherein said first and second ports are spaced apart along a first axis, and said second and third ports are spaced apart along a second axis, wherein said second axis traverses said first axis between said first and second ports.
32. A switching element as claimed in any one of claims 23 to 31, comprising selector means having a plurality of inputs for selectively coupling one of said selector inputs to said input port.
33. A switching element as claimed in any one of claims 23 to 32, further comprising selector means having a plurality of output ports and arranged to couple said output port to one of said selector output ports.
34. A switching element as claimed in any one of claims 23 to 33, having a first input mode in which one of said first and second switch means couples a respective one of the first and second ports to said output port, and a first output mode, in which the other of said first and second switch means couples a respective one of the first and second ports to said input port.
35. A switching element as claimed in any one of claims 29 to 34, having a second output mode in which one of said third and fourth switch means couples a respective one of said third and fourth ports to said input port, and a second input mode in which the other of said third and fourth switch means couples the other respective one of said third and fourth ports to said output port.
36. A switching element as claimed in any one of claims 29 to 35, wherein said first, second, third and fourth ports are arranged such that said first and second ports lie on a first axis and said third and fourth ports lie on a second axis transverse to said first axis and between said first and second ports, and said switching element has a third input mode, in which one of said switch means couples a respective port to said input port, a third output mode, in which a port adjacent to the port coupled to the input port is coupled to said output port, and a bypass mode, in which the other two ports are coupled.
37. A switching element as claimed in claim 36, wherein at least one switch means includes a switch which is switchable to couple a respective port to said input port and another state in which said switch couples the respective port to an adjacent port.
38. A switching element as claimed in any one of claims 23 to 37, comprising an inverter for inverting a signal, coupled between a respective port and said output port.
39. A switching element as claimed in any one of claims 23 to 38, further comprising an inverter coupled between said input port and a respective output port.
40. A switching element as claimed in any one of claims 23 to 39, comprising first, second, third and fourth ports, a first inverter coupled to each port, a selector switch associated with each port having a first input coupled to the output of said inverter and a second input port coupled to the input port of said switching element, a respective second inverter coupled to the output port of said selector switch, the output port of a respective inverter being coupled to an adjacent port.
41. A switching element as claimed in claim 40, wherein said second inverter comprises a tri-stateable buffer.
42. A data processor apparatus comprising a plurality of processor elements and a memory having a plurality of segments, each containing at least one column of storage elements, and each segment having a memory port, and wherein at least one memory port is coupled to at least two processor elements.
43. A data processor as claimed in claim 42, wherein each of said data processor elements comprises a register having an input port coupled to said memory port.
44. A data processor having a first memory block, a first array of processor elements, each processor element being capable of accessing said first memory block, a first array of switching elements each associated with a respective processor element of said first array, a second memory block and a second array of processor elements, each being capable of accessing said second memory block, a second array of switching elements each associated with a respective processor element of said second array, wherein a corresponding switching element of said first array is coupled to a corresponding switching element of said second array.
45. A data processor apparatus as claimed in claim 44, wherein said second memory block is between said first array of switching elements and said second array of switching elements.
46. A data processor comprising an array of circuit elements and a switching element associated with each of said circuit elements, said switching elements being interconnected to enable data to be transferred between said circuit elements.
47. A data processor as claimed in claim 46, wherein at least one of said switching elements comprises a switching element as claimed in any one of claims 23 to 41.